This tutorial demonstrates how to use MOFA on the bulk multi-omics data set that was used in the first publication of MOFA.
It corresponds to the original MOFA v1 vignette upgraded to MOFA2 code.
Briefly, the data set consists of N=200 samples from Chronic Lymphocytic Leukemia (CLL) patients where four omics were profiled from blood samples: DNA methylation (450K Illumina microarray), bulk RNA-seq, somatic mutations and drug response data from blood for N=200 patients with. The data set is explained in detail in this article and is publicly available here
library(knitr)
library(MOFA2)
library(MOFAdata)
library(data.table)
library(ggplot2)
library(tidyverse)
Data is stored as a list of matrices. Features are stored in the rows and samples in the columns
utils::data("CLL_data")
lapply(CLL_data,dim)
## $Drugs
## [1] 310 200
##
## $Methylation
## [1] 4248 200
##
## $mRNA
## [1] 5000 200
##
## $Mutations
## [1] 69 200
The mRNA expression has been normalised by library size, followed by a variance stabilizing transformation using DESeq2
hist(CLL_data$mRNA)
to.plot <- data.table(
mean = rowMeans(CLL_data$mRNA,na.rm=T),
sd = apply(CLL_data$mRNA,1,sd,na.rm=T)
)
ggplot(to.plot, aes(x=mean, y=sd)) +
geom_point() +
stat_smooth(method="loess") +
theme_classic()
## `geom_smooth()` using formula 'y ~ x'
DNA methylation is calculated for every CpG site using the M-value, which provides a better summary statistic for downstream analysis.
Notice that we use only the top 1% (N=4248) most variable sites.
hist(CLL_data$Methylation)
In this study they have measured the effect of relevant drugs ex vivo using a high-throughput assay. For each drug they have measured 5 different concentrations from (XX_5 to XX_1, from lowest to highest). The value reported is the viability score (0=all cells died, 1=no cells died).
hist(CLL_data$Drugs)
Mutations are assessed in a panel of common cancer mutations and are summarised in a binary format:
table(CLL_data$Mutations)
##
## 0 1
## 8474 667
Notice that most mutations are not frequent and we will likely not have enough power to do good statistical inference with them:
to.plot <- data.table(
mutation = rownames(CLL_data$Mutations),
frequency = rowMeans(CLL_data$Mutations,na.rm=T)
)
ggplot(to.plot, aes(x=reorder(mutation,-frequency), y=frequency)) +
geom_bar(stat = "identity") +
theme_classic() +
labs(x="", y="Frequency") +
theme(
axis.text.x = element_text(colour="black",size=rel(0.7), angle=90, hjust=1, vjust=0.5),
)
Load sample metadata as a data.frame. Important columns are:
CLL_metadata <- fread("http://ftp.ebi.ac.uk/pub/databases/mofa/cll_vignette/sample_metadata.txt")
head(CLL_metadata) %>% kable# %>% kable_styling(bootstrap_options = c("striped", "hover"))
| sample | Gender | age | TTT | TTD | treatedAfter | died | IGHV | trisomy12 |
|---|---|---|---|---|---|---|---|---|
| H005 | m | 75.26575 | 0.5749487 | 2.625599 | TRUE | FALSE | 1 | 0 |
| H006 | m | NA | NA | NA | NA | NA | NA | NA |
| H007 | f | NA | NA | NA | NA | NA | NA | NA |
| H008 | m | NA | NA | NA | NA | NA | NA | NA |
| H010 | f | 72.78082 | 2.9322382 | 2.932238 | FALSE | FALSE | 0 | 0 |
| H011 | f | 72.99452 | 0.0191650 | 2.951403 | TRUE | FALSE | 1 | 0 |
Create the MOFA object
MOFAobject <- create_mofa(CLL_data)
MOFAobject
## Untrained MOFA model with the following characteristics:
## Number of views: 4
## Views names: Drugs Methylation mRNA Mutations
## Number of features (per view): 310 4248 5000 69
## Number of groups: 1
## Groups names: group1
## Number of samples (per group): 200
Visualise the number of views (rows) and the number of groups (columns) exist, what are their corresponding dimensionalities and how many missing information they have (grey bars).
plot_data_overview(MOFAobject)
Important arguments:
FALSEFALSEdata_opts <- get_default_data_options(MOFAobject)
data_opts
## $scale_views
## [1] FALSE
##
## $scale_groups
## [1] FALSE
##
## $views
## [1] "Drugs" "Methylation" "mRNA" "Mutations"
##
## $groups
## [1] "group1"
Important arguments:
model_opts <- get_default_model_options(MOFAobject)
model_opts$num_factors <- 15
model_opts
## $likelihoods
## Drugs Methylation mRNA Mutations
## "gaussian" "gaussian" "gaussian" "bernoulli"
##
## $num_factors
## [1] 15
##
## $spikeslab_factors
## [1] FALSE
##
## $spikeslab_weights
## [1] TRUE
##
## $ard_factors
## [1] FALSE
##
## $ard_weights
## [1] TRUE
Important arguments:
train_opts <- get_default_training_options(MOFAobject)
train_opts$convergence_mode <- "medium"
train_opts$seed <- 42
train_opts
## $maxiter
## [1] 1000
##
## $convergence_mode
## [1] "medium"
##
## $drop_factor_threshold
## [1] -1
##
## $verbose
## [1] FALSE
##
## $startELBO
## [1] 1
##
## $freqELBO
## [1] 1
##
## $stochastic
## [1] FALSE
##
## $gpu_mode
## [1] FALSE
##
## $seed
## [1] 42
Prepare the MOFA object
MOFAobject <- prepare_mofa(MOFAobject,
data_options = data_opts,
model_options = model_opts,
training_options = train_opts
)
Train the model: this should take ~5min
# NOTE: The software has evolved since the original publication and the results will not be 100% identical to the original publication, please use the pretrained model if you are running through the vignette for the fist time
# MOFAobject <- run_mofa(MOFAobject, outfile="/Users/ricard/Downloads/MOFA2_CLL.hdf5")
# saveRDS(MOFAobject,"MOFA2_CLL.rds")
# Load precomputed model
MOFAobject <- readRDS(url("http://ftp.ebi.ac.uk/pub/databases/mofa/cll_vignette/MOFA2_CLL.rds"))
The MOFA object consists of multiple slots where relevant data and information is stored. For descriptions, you can read the documentation using ?MOFA. The most important slots are:
slotNames(MOFAobject)
## [1] "data" "intercepts" "imputed_data"
## [4] "samples_metadata" "features_metadata" "expectations"
## [7] "training_stats" "training_options" "stochastic_options"
## [10] "data_options" "model_options" "dimensions"
## [13] "on_disk" "dim_red" "cache"
## [16] "status"
Data:
names(MOFAobject@data)
## [1] "Drugs" "Methylation" "mRNA" "Mutations"
dim(MOFAobject@data$Drugs$group1)
## [1] 310 200
Factor and Weight values (expectations of the posterior distributions):
names(MOFAobject@expectations)
## [1] "Z" "W"
# Dimensionality of the factor matrix: 200 samples, 15 factors
dim(MOFAobject@expectations$Z$group1)
## [1] 200 15
# Dimensionality of the mRNA Weight matrix: 5000 features, 15 factors
dim(MOFAobject@expectations$W$mRNA)
## [1] 5000 15
updated_features_names <- features_names(MOFAobject)
Load drug metadata
drug_metadata <- fread("http://ftp.ebi.ac.uk/pub/databases/mofa/cll_vignette/drugs.txt.gz")
head(drug_metadata) %>% kable# %>% kable_styling(bootstrap_options = c("striped", "hover"))
| drug_id | name | main_targets | target_category | pathway |
|---|---|---|---|---|
| D_001 | navitoclax | BCL2, BCL-XL, BCL-W | Apoptosis (BH3, Survivin) | Apoptosis (BH3) |
| D_002 | ibrutinib | BTK | B-cell receptor | B-cell receptor |
| D_003 | idelalisib | PI3K delta | PI3K/AKT | B-cell receptor |
| D_004 | SNS-032 | CDK2/7/9 | Cell cycle control | Other |
| D_005 | olaparib | PARP1/2 | DNA damage response | Other |
| D_006 | fludarabine | Purine analogue | DNA damage response | Other |
tmp <- drug_metadata$name; names(tmp) <- drug_metadata$drug_id
updated_features_names[["Drugs"]] <- stringr::str_replace_all(features_names(MOFAobject)[["Drugs"]], tmp)
Notice that the genes in the input data are encoded as ENSEMBL IDs:
head(features_names(MOFAobject)[["mRNA"]])
## [1] "ENSG00000244734" "ENSG00000158528" "ENSG00000198478" "ENSG00000175445"
## [5] "ENSG00000174469" "ENSG00000188536"
Keep the model with the ENSEMBL IDs for the gene set enrichment analysis section
MOFAobject.ensembl <- MOFAobject
Load gene metadata
gene_metadata <- fread("http://ftp.ebi.ac.uk/pub/databases/mofa/cll_vignette/Hsapiens_genes_BioMart.87.txt.gz")
gene_metadata[,symbol:=ifelse(symbol=="",ens_id,symbol)]
tmp <- gene_metadata$symbol; names(tmp) <- gene_metadata$ens_id
Rename and avoid duplicated names with the Mutations view
features_names(MOFAobject) <- updated_features_names
The sample metadata must be provided as a data.frame and it must contain a column sample with the sample IDs. Make sure that the samples in the metadata match the samples in the model
samples_metadata(MOFAobject) <- CLL_metadata
A good sanity check is to verify that the Factors are largely uncorrelated. In MOFA there are no orthogonality constraints such as in Principal Component Analysis, but if there is a lot of correlation between Factors this suggests a poor model fit. Reasons? Perhaps you used too many factors or perhaps the normalisation is not adequate.
plot_factor_cor(MOFAobject)
The most important insight that MOFA generates is the variance decomposition analysis. This plot shows the percentage of variance explained by each factor across each data modality (and group, if provided). It summarises the sources of variation from a complex heterogeneous data set in a single figure.
plot_variance_explained(MOFAobject, max_r2=10)
What insights from the data can we learn just from inspecting this plot?
(Q) Based on the MOFA output, if you were to profile just one molecular layer, which one would you choose to maximise the amount of sources of variation captured?
There are a few systematic strategies to characterise the molecular signal that underlies each MOFA Factor and to relate them to existent sample covariates:
Let’s test the association between the MOFA factors versus Gender and age:
correlate_factors_with_covariates(MOFAobject,
covariates = c("Gender","age","died"),
plot="log_pval"
)
## Warning in correlate_factors_with_covariates(MOFAobject, covariates =
## c("Gender", : There are non-numeric values in the covariates data.frame,
## converting to numeric...
Most Factors don’t have a clear association with any of the covariates. Only Factor 11 has a (weak) association with survival outcome. We will explore association with clinical measurements later in the vignette.
How do we interpret the factor values?
Each factor captures a different source of variability in the data. Mathematically, each Factor is defined by a linear combination of the input features. As the data is centered prior to running MOFA, each Factor ordinates cells along a one-dimensional axis that is centered at zero. Samples with different signs manifest opposite phenotypes along the inferred axis of variation, with higher absolute value indicating a stronger effect. Note that the interpretation of MOFA factors is analogous to the interpretation of the principal components in PCA.
plot_factor(MOFAobject,
factors = 1,
color_by = "Factor1"
)
How do we interpret the weights?
The weights provide a score for each feature on each factor. Features with no association with the corresponding factor are expected to have values close to zero, whereas features with strong association with the factor are expected to have large absolute values. The sign of the weights indicates the direction of the effect: a positive weights indicates that the feature has higher levels in the cells with positive factor values, and vice-versa.
By looking at the variance explained plot, we saw that Factor 1 captures variation in all data modalities. Out of all omics, the somatic mutation data is a good place to start, as somatic mutations are very sparse, easy to interpret and any change in the DNA is likely to have downstream consequences to all other molecular layers. Let’s plot the weights:
plot_weights(MOFAobject,
view = "Mutations",
factor = 1,
nfeatures = 10, # Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
Notice that most features lie at zero, indicating that most features have no association with Factor 1. There is however one gene that clearly stands out: IGHV (immunoglobulin heavy chain variable region). This is the main clinical marker for CLL.
An alternative visualisation to the full distribution of weights is to do a line plot that displays only the top features with the corresponding weight sign on the right:
plot_top_weights(MOFAobject,
view = "Mutations",
factor = 1,
nfeatures = 10, # Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
IGHV has a positve weight. This means that samples with positive Factor 1 values have the IGHV mutation whereas samples with negative Factor 1 values do not have the IGHV mutation. To confirm this, let’s plot the Factor values and colour the IGHV mutation status.
plot_factor(MOFAobject,
factors = 1,
color_by = "IGHV",
add_violin = TRUE,
dodge = TRUE,
show_missing = FALSE
)
We can also plot Factor values coloured by other covariates, for example Gender. As concluded from the association analysis above, this variable has no association with Factor 1:
plot_factor(MOFAobject,
factors = 1,
color_by = "Gender",
dodge = TRUE,
add_violin = TRUE
)
From the variance explained plot we know that Factor 1 drives variation across all data modalities. Let’s visualise the mRNA expression changes that are associated with Factor 1:
plot_weights(MOFAobject,
view = "mRNA",
factor = 1,
nfeatures = 10
)
In this case we have a large amount of genes that have large positive and negative weights. Genes with large positive values will be more expressed in the samples with IGHV mutation, whereas genes with large negative values will be more expressed in the samples without the IGHV mutation. Let’s verify this. The function plot_data_scatter generates a scatterplot of Factor 1 values (x-axis) versus expression values (y-axis) for the top 4 genes with largest positive weight. Samples are coloured by IGHV status:
plot_data_scatter(MOFAobject,
view = "mRNA",
factor = 1,
features = 4,
sign = "positive",
color_by = "IGHV"
) + labs(y="RNA expression")
This function generates a scatterplot of Factor 1 values (x-axis) versus expression values (y-axis) for the top 4 genes with largest negative weight. Samples are coloured by IGHV status:
plot_data_scatter(MOFAobject,
view = "mRNA",
factor = 1,
features = 4,
sign = "negative",
color_by = "IGHV"
) + labs(y="RNA expression")
An alternative visualisation is to use a heatmap
plot_data_heatmap(MOFAobject,
view = "mRNA",
factor = 1,
features = 25,
cluster_rows = FALSE, cluster_cols = FALSE,
show_rownames = TRUE, show_colnames = FALSE,
scale = "row"
)
plot_data_heatmap has an interesting argument to “beautify” the heatmap: denoise = TRUE. Instead of plotting the (noisy) input data, we can plot the data reconstructed by the model, where noise has been removed:
plot_data_heatmap(MOFAobject,
view = "mRNA",
factor = 1,
features = 25,
denoise = TRUE,
cluster_rows = FALSE, cluster_cols = FALSE,
show_rownames = TRUE, show_colnames = FALSE,
scale = "row"
)
(Q) Can you suggest new RNA expression and DNA methylation markers for personalised treatment recommendations (depending on the IGHV status? First explore the MOFA weights, then go back to the input data and do boxplots using the chosen markers (x-axis being the IGHV status and y-axis being the marker’s expression or methylation values). The section Customized analysis may be helpful.
(Q) Your task is to provide a characterisation for Factor 3. Try a similar pipeline as for Factor 1 and answer the following questions: - Which mutation underlies Factor 3? - Can you identify mRNA markers? - Can you explore the methylation signal for this Factor? - Do a (small) bibliographical search to check if your predictions make sense
Now that we have characterised the etiology of the two main Factors, let’s explore them together:
p <- plot_factors(MOFAobject,
factors = c(1,3),
color_by = "IGHV",
shape_by = "trisomy12",
dot_size = 2.5,
show_missing = T
)
p <- p +
geom_hline(yintercept=-1, linetype="dashed") +
geom_vline(xintercept=(-0.5), linetype="dashed")
print(p)
## Warning: Removed 27 rows containing missing values (geom_point).
(Q) Why is this plot important for personalised medicine?
The scatterplot of Factor 1 vs Factor 3 reveals that a few samples are missing the somatic mutation status. In this case, the doctors were not able to classify patients into their clinical subgroups. But we can now use MOFA to exploit the molecular profiles and attempt to impute the IGHV and trisomy12 status.
library(randomForest)
# Prepare data
df <- as.data.frame(get_factors(MOFAobject, factors=c(1,2))[[1]])
# Train the model for IGHV
df$IGHV <- as.factor(MOFAobject@samples_metadata$IGHV)
model.ighv <- randomForest(IGHV ~ ., data=df[!is.na(df$IGHV),], ntree=10)
df$IGHV <- NULL
# Do predictions
MOFAobject@samples_metadata$IGHV.pred <- stats::predict(model.ighv, df)
# Train the model for Trisomy12
df$trisomy12 <- as.factor(MOFAobject@samples_metadata$trisomy12)
model.trisomy12 <- randomForest(trisomy12 ~ ., data=df[!is.na(df$trisomy12),], ntree=10)
df$trisomy12 <- NULL
MOFAobject@samples_metadata$trisomy12.pred <- stats::predict(model.trisomy12, df)
Plot predictions for IGHV
MOFAobject@samples_metadata$IGHV.pred_logical <- c("True","Predicted")[as.numeric(is.na(MOFAobject@samples_metadata$IGHV))+1]
p <- plot_factors(MOFAobject,
factors = c(1,3),
color_by = "IGHV.pred",
shape_by = "IGHV.pred_logical",
dot_size = 2.5,
show_missing = T
)
p <- p +
geom_hline(yintercept=-1, linetype="dashed") +
geom_vline(xintercept=(-0.5), linetype="dashed")
print(p)
In addition to exploring the individual weights for each factor, we can use enrichment analysis to look for signiificant associations of factors to genesets. Here, we use the Reactome genesets for illustrations, which is contained in the MOFAdata package. For more details on how the GSEA works we encourage the users to read the GSEA vignette
Gene set annotations are provided as a binary membership matrix. Genes are stored in the rows, pathways are stored in the columns. A value of 1 indicates that gene \(j\) belongs to the pathway \(i\).
utils::data(reactomeGS) # from MOFAdata
head(colnames(reactomeGS))
## [1] "ENSG00000187634" "ENSG00000188976" "ENSG00000187961" "ENSG00000187583"
## [5] "ENSG00000187642" "ENSG00000188290"
head(rownames(reactomeGS))
## [1] "Interleukin-6 signaling"
## [2] "Apoptosis"
## [3] "Hemostasis"
## [4] "Intrinsic Pathway for Apoptosis"
## [5] "Cleavage of Growing Transcript in the Termination Region "
## [6] "PKB-mediated events"
These are the steps for doing Gene Set Enrichment Analysis (GSEA) with MOFA:
j belongs to pathway i. A value of 0 indicates elsewise.mean.diff (difference in the average weight between foreground and background genes) or rank.sum (difference in the sum of ranks between foreground and background genes).parametric (a simple and very liberal parametric t-test), cor.adj.parametric (parametric t-test adjusted by the correlation between features), permutation (unparametric, the null distribution is created by permuting the weights. This option is computationally expensive, but it preserves the correlation structure between features in the data.).An important consideration when running GSEA is that MOFA contains positive and negative weights. There will be cases where the genes with negative weights all belong to a specific pathway but genes with positive weights belong to other pathways. If this is true, doing GSEA with all of them together could dilute the signal. Hence, we recommend the user to do GSEA separately for (+) and (-) weights, and possibly also jointly with all weights.
# GSEA on positive weights, with default options
res.positive <- run_enrichment(MOFAobject.ensembl,
feature.sets = reactomeGS,
view = "mRNA",
sign = "positive"
)
# GSEA on negative weights, with default options
res.negative <- run_enrichment(MOFAobject.ensembl,
feature.sets = reactomeGS,
view = "mRNA",
sign = "negative"
)
The enrichment analysis returns a list of 5 elements:
names(res.positive)
## [1] "feature.sets" "pval" "pval.adj"
## [4] "feature.statistics" "set.statistics" "sigPathways"
Plot an overview of the number of significant pathways per factor.
It seems that most of the Factors do not have clear gene set signatures. A clear exception is Factor 5, which has a very strong enrichment for genes with positive weights.
plot_enrichment_heatmap(res.positive)
plot_enrichment_heatmap(res.negative)
(Q) Can you characterise Factor 5 based on the GSEA results?
Hint: use the functions plot_enrichment and plot_enrichment_detailed.
For customized exploration of weights and factors, you can directly fetch the variables from the model using ‘get’ functions: get_weights, get_factors and get_data:
weights <- get_weights(MOFAobject,
views = "all",
factors = "all",
as.data.frame = TRUE
)
head(weights)
## feature factor value view
## 1 navitoclax_1 Factor1 -0.0001029638 Drugs
## 2 navitoclax_2 Factor1 -0.0061835683 Drugs
## 3 navitoclax_3 Factor1 -0.0484350338 Drugs
## 4 navitoclax_4 Factor1 -0.0396829204 Drugs
## 5 navitoclax_5 Factor1 -0.0240856562 Drugs
## 6 ibrutinib_1 Factor1 -0.0004036970 Drugs
factors <- get_factors(MOFAobject,
factors = "all",
as.data.frame = TRUE
)
head(factors)
## sample factor value group
## 1 H045 Factor1 -1.418579 group1
## 2 H109 Factor1 -1.415237 group1
## 3 H024 Factor1 1.663551 group1
## 4 H056 Factor1 1.224276 group1
## 5 H079 Factor1 -1.217420 group1
## 6 H164 Factor1 -1.791768 group1
data <- get_data(MOFAobject,
views = "all",
as.data.frame = TRUE
)
head(data)
## view group feature sample value
## 1 Drugs group1 navitoclax_1 H045 0.02363938
## 2 Drugs group1 navitoclax_2 H045 0.04623274
## 3 Drugs group1 navitoclax_3 H045 0.31874706
## 4 Drugs group1 navitoclax_4 H045 0.82370272
## 5 Drugs group1 navitoclax_5 H045 0.89627769
## 6 Drugs group1 ibrutinib_1 H045 0.09432807
The factors inferred by MOFA can be related to clinical outcomes such as time to treatment or survival times. As this type of data is censored (not for all samples we have already observed the event) we will use Cox models for this purpose. In a Cox proportional hazards model we model the hazard of an event ocurring (e.g. death or treatment) as a function of some covariates (here the factors). If a factor has a influence on the surivival time or time to treatment it will receive a high absoulte coefficient in the Cox model. In particular:
To fit these models we will use the coxph function in the survival package. The survival data is stored in a survival object that contains both the time a sample has been followed up and whether the event has occured (as 0,1).
Let’s take time to treatment as an example here. The sample metadata contains the follow-up times per sample in years in the column TTT, and the column treatedAfter indicated whether a treatment occured.
library(survival)
library(survminer)
SurvObject <- Surv(MOFAobject@samples_metadata$TTT, MOFAobject@samples_metadata$treatedAfter)
Z <- get_factors(MOFAobject)[[1]]
fit <- coxph(SurvObject ~ Z)
fit
## Call:
## coxph(formula = SurvObject ~ Z)
##
## coef exp(coef) se(coef) z p
## ZFactor1 -0.43422 0.64777 0.10944 -3.968 7.26e-05
## ZFactor2 0.53961 1.71534 0.13380 4.033 5.51e-05
## ZFactor3 0.10059 1.10582 0.14057 0.716 0.474259
## ZFactor4 0.04846 1.04965 0.10528 0.460 0.645339
## ZFactor5 0.27567 1.31741 0.11328 2.434 0.014952
## ZFactor6 -0.25121 0.77786 0.12383 -2.029 0.042500
## ZFactor7 0.23077 1.25957 0.11332 2.036 0.041709
## ZFactor8 0.40083 1.49306 0.10826 3.702 0.000214
## ZFactor9 -0.04353 0.95740 0.10580 -0.411 0.680751
## ZFactor10 -0.12880 0.87915 0.10962 -1.175 0.240011
## ZFactor11 0.54195 1.71935 0.13455 4.028 5.63e-05
## ZFactor12 -0.80444 0.44734 0.12448 -6.463 1.03e-10
## ZFactor13 -0.12435 0.88307 0.11611 -1.071 0.284175
## ZFactor14 -0.05307 0.94831 0.08940 -0.594 0.552783
## ZFactor15 0.03688 1.03757 0.12081 0.305 0.760168
##
## Likelihood ratio test=114.6 on 15 df, p=< 2.2e-16
## n= 174, number of events= 96
## (26 observations deleted due to missingness)
We can see that several factors have a significant association to time to treatment. For example, Factor 1 has a negative coefficient. Samples with low factor values have an increased hazard compared to samples with a large factor values.
(Q) Which Factors are associated with the clinical covariate (time to next treatment)?
s <- summary(fit)
coef <- s[["coefficients"]]
df <- data.frame(
factor = factor(rownames(coef), levels = rev(rownames(coef))),
p = coef[,"Pr(>|z|)"],
coef = coef[,"exp(coef)"],
lower = s[["conf.int"]][,"lower .95"],
higher = s[["conf.int"]][,"upper .95"]
)
ggplot(df, aes(x=factor, y=coef, ymin=lower, ymax=higher)) +
geom_pointrange() +
coord_flip() +
scale_x_discrete() +
labs(y="Hazard Ratio", x="") +
geom_hline(aes(yintercept=1), linetype="dotted") +
theme_bw()
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] survminer_0.4.6 ggpubr_0.2.5.999 survival_3.1-11
## [4] randomForest_4.6-14 forcats_0.5.0 stringr_1.4.0
## [7] dplyr_0.8.5 purrr_0.3.3 readr_1.3.1
## [10] tidyr_1.0.2 tibble_3.0.0 tidyverse_1.3.0
## [13] ggplot2_3.3.0 data.table_1.12.8 MOFAdata_1.0.0
## [16] MOFA2_0.99.7 knitr_1.28 BiocStyle_2.14.4
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 ggsignif_0.6.0 ellipsis_0.3.0
## [4] rio_0.5.16 fs_1.4.0 rstudioapi_0.11
## [7] farver_2.0.3 ggrepel_0.8.2 fansi_0.4.1
## [10] lubridate_1.7.4 xml2_1.3.0 splines_3.6.3
## [13] R.methodsS3_1.8.0 mnormt_1.5-6 jsonlite_1.6.1
## [16] km.ci_0.5-2 broom_0.5.5 dbplyr_1.4.2
## [19] R.oo_1.23.0 pheatmap_1.0.12 HDF5Array_1.14.3
## [22] BiocManager_1.30.10 compiler_3.6.3 httr_1.4.1
## [25] backports_1.1.6 assertthat_0.2.1 Matrix_1.2-18
## [28] cli_2.0.2 htmltools_0.4.0 tools_3.6.3
## [31] gtable_0.3.0 glue_1.4.0 reshape2_1.4.3
## [34] Rcpp_1.0.4 carData_3.0-3 cellranger_1.1.0
## [37] vctrs_0.2.4 nlme_3.1-145 psych_1.9.12.31
## [40] xfun_0.12 openxlsx_4.1.4 rvest_0.3.5
## [43] lifecycle_0.2.0 rstatix_0.5.0 zoo_1.8-7
## [46] scales_1.1.0 hms_0.5.3 parallel_3.6.3
## [49] rhdf5_2.30.1 RColorBrewer_1.1-2 yaml_2.2.1
## [52] curl_4.3 reticulate_1.15 gridExtra_2.3
## [55] KMsurv_0.1-5 stringi_1.4.6 highr_0.8
## [58] S4Vectors_0.24.3 corrplot_0.84 BiocGenerics_0.32.0
## [61] zip_2.0.4 BiocParallel_1.20.1 rlang_0.4.5
## [64] pkgconfig_2.0.3 matrixStats_0.56.0 evaluate_0.14
## [67] lattice_0.20-41 Rhdf5lib_1.8.0 labeling_0.3
## [70] cowplot_1.0.0 tidyselect_1.0.0 plyr_1.8.6
## [73] magrittr_1.5 bookdown_0.18 R6_2.4.1
## [76] IRanges_2.20.2 generics_0.0.2 DelayedArray_0.12.2
## [79] DBI_1.1.0 pillar_1.4.3 haven_2.2.0
## [82] foreign_0.8-76 withr_2.1.2 mgcv_1.8-31
## [85] abind_1.4-5 modelr_0.1.6 crayon_1.3.4
## [88] car_3.0-7 survMisc_0.5.5 rmarkdown_2.1
## [91] grid_3.6.3 readxl_1.3.1 reprex_0.3.0
## [94] digest_0.6.25 xtable_1.8-4 R.utils_2.9.2
## [97] stats4_3.6.3 munsell_0.5.0